Upstream Steps
This notebook
import numpy as np
import pandas as pd
import seaborn as sns
import igraph as ig
import matplotlib.pyplot as plt
from datetime import datetime
from scipy.sparse import csr_matrix, isspmatrix
import scanpy as sc
import scanpy.external as sce
sc.settings.verbosity = 3
sc.settings.set_figure_params(dpi=80)
#results_file = '/home/..../brainomics/Dati/3_AdataDimRed.h5ad'
results_file = '/group/brainomics/Intermediate/3_AdataDimRed.h5ad'
print(datetime.now())
2022-11-25 19:42:24.002373
adata = sc.read('/group/brainomics/Intermediate/2_AdataNorm.h5ad')
print('Loaded Normalizes AnnData object: number of cells', adata.n_obs)
print('Loaded Normalizes AnnData object: number of genes', adata.n_vars)
print('Available metadata for each cell: ', adata.obs.columns)
Loaded Normalizes AnnData object: number of cells 27457
Loaded Normalizes AnnData object: number of genes 13945
Available metadata for each cell: Index(['Cluster', 'Subcluster', 'Donor', 'Layer', 'Gestation_week', 'Index',
'Library', 'Number_genes_detected', 'Number_UMI',
'Percentage_mitochondrial', 'S_phase_score', 'G2M_phase_score', 'Phase',
'n_genes_by_counts', 'total_counts', 'total_counts_mito',
'pct_counts_mito', 'total_counts_ribo', 'pct_counts_ribo', 'n_genes',
'n_counts'],
dtype='object')
adata_check = adata.copy()
sc.pp.pca(adata_check, n_comps=50, use_highly_variable=True, svd_solver='arpack')
computing PCA
on highly variable genes
with n_comps=50
finished (0:00:01)
sc.pp.neighbors(adata_check, n_neighbors=30, n_pcs=25)
computing neighbors
using 'X_pca' with n_pcs = 25
2022-11-25 19:42:35.169676: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2022-11-25 19:42:35.295165: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2022-11-25 19:42:36.325841: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/4.2.1/lib/R/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs 2022-11-25 19:42:36.326018: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/4.2.1/lib/R/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs 2022-11-25 19:42:36.326050: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
finished: added to `.uns['neighbors']`
`.obsp['distances']`, distances for each pair of neighbors
`.obsp['connectivities']`, weighted adjacency matrix (0:00:31)
sc.tl.umap(adata_check)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:21)
sc.pl.umap(adata_check, color=['Donor', 'Layer'], size=10)
/usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter( /usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter(
plt.rcParams['figure.dpi'] = 120
sc.pl.umap(adata_check, color=['Donor'], size=10)
/usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter(
del adata_check
From the UMAP, it is visible a batch effect related to the donor identity, particularly in the region of migrating neurons. We therefore apply a batch correction using Harmony algorithm and
sc.pp.pca(adata, n_comps=50, use_highly_variable=True, svd_solver='arpack')
computing PCA
on highly variable genes
with n_comps=50
finished (0:00:01)
sc.settings.set_figure_params(dpi=80)
sc.pl.pca_variance_ratio(adata)
sc.pl.pca_variance_ratio(adata, log=True)
Harmony uses an iterative clustering approach to align cells from different batches. For each iteration:
References
sc.external.pp.harmony_integrate(adata, 'Donor')
2022-11-25 19:43:34,232 - harmonypy - INFO - Iteration 1 of 10 INFO:harmonypy:Iteration 1 of 10 2022-11-25 19:43:41,392 - harmonypy - INFO - Iteration 2 of 10 INFO:harmonypy:Iteration 2 of 10 2022-11-25 19:43:48,542 - harmonypy - INFO - Iteration 3 of 10 INFO:harmonypy:Iteration 3 of 10 2022-11-25 19:43:55,935 - harmonypy - INFO - Iteration 4 of 10 INFO:harmonypy:Iteration 4 of 10 2022-11-25 19:44:03,160 - harmonypy - INFO - Iteration 5 of 10 INFO:harmonypy:Iteration 5 of 10 2022-11-25 19:44:07,525 - harmonypy - INFO - Iteration 6 of 10 INFO:harmonypy:Iteration 6 of 10 2022-11-25 19:44:10,435 - harmonypy - INFO - Converged after 6 iterations INFO:harmonypy:Converged after 6 iterations
New corrected PC are saved in .obs["X_pca_harmony"]
sc.pl.embedding(adata, basis="X_pca_harmony", color=['Donor'])
/usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter(
We compute the neighborhood graph of cells using the harmony-corrected PCA representation of the data. This identifies how similar a cell is to another cell, definying cells that are close from those that are not.
This step is propedeutic for UMAP plotting and for clustering.
Key parameters:
sc.pp.neighbors(adata, n_neighbors=80, n_pcs=20, use_rep="X_pca_harmony", key_added="harmony")
#sc.pp.neighbors(adata, n_neighbors=80, n_pcs=15, use_rep="X_pca_harmony", key_added="harmony", metric='cosine')
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:17)
sc.tl.umap(adata, neighbors_key="harmony", random_state=1)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:29)
sc.tl.diffmap(adata, neighbors_key="harmony")
computing Diffusion Maps using n_comps=15(=n_dcs)
computing transitions
finished (0:00:00)
eigenvalues of transition matrix
[1. 0.99714315 0.99270475 0.99232537 0.9917588 0.9895217
0.9785628 0.9751901 0.97225225 0.9687998 0.9620149 0.9564335
0.944027 0.93468106 0.9310887 ]
finished: added
'X_diffmap', diffmap coordinates (adata.obsm)
'diffmap_evals', eigenvalues of transition matrix (adata.uns) (0:00:02)
#Takes quite some time
sc.tl.draw_graph(adata, neighbors_key="harmony")
drawing single-cell graph using layout 'fa'
finished: added
'X_draw_graph_fa', graph_drawing coordinates (adata.obsm) (0:05:06)
sc.pl.umap(adata, color=['n_genes_by_counts',"total_counts", 'pct_counts_mito', 'pct_counts_ribo'])
sc.settings.set_figure_params(dpi=120)
sc.pl.umap(adata, color=['Donor', 'Cluster'])
/usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter( /usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter(
sc.settings.set_figure_params(dpi=80)
sc.pl.diffmap(adata, color=['n_genes_by_counts',"total_counts", 'pct_counts_mito', 'pct_counts_ribo'],
components=['2,3'])
sc.settings.set_figure_params(dpi=120)
sc.pl.diffmap(adata, color=['Donor', 'Cluster'], components=['2,3'])
/usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter( /usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter(
sc.settings.set_figure_params(dpi=80)
sc.pl.draw_graph(adata, color=['n_genes_by_counts',"total_counts", 'pct_counts_mito', 'pct_counts_ribo'])
sc.settings.set_figure_params(dpi=120)
sc.pl.draw_graph(adata, color=['Donor', 'Cluster'])
/usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter( /usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter(
def selectMarkers(adataObj, mList):
""" From a list of gene names select only the genes that are present in adata.var
"""
#Select markers present in adata
p = adataObj.var_names[adataObj.var_names.isin(mList) == True]
#Keep the same order as input list
p = [x for x in mList if x in p]
#Select missing genes
ab = set(mList).difference(set(adataObj.var_names))
#Print message
if len(ab) == len(mList):
print('\nAll markers are missing')
else:
print('\nThe following marker genes are missing: ', ab)
return(p)
def CustomUmap(adata, genes):
genes = selectMarkers(adata, genes)
sc.pl.umap(adata, color=genes, size=10, frameon=False,
vmin='p1', vmax='p99', layer='lognormcounts')
TopTriku = adata.var.sort_values(by=['triku_distance'], ascending=False).head(16).index
CustomUmap(adata, TopTriku)
The following marker genes are missing: set()
marker_dictionary = {
'Proliferation': ['MKI67', 'CDC20', 'HMGB2', 'CCNB1'],
'Apical Progenitors': ['FABP7', 'GLI3', 'PAX6', 'NES', 'SOX1', 'SOX2', 'SOX9', 'VIM'],
'Intermediate Progenitors': ['EOMES', 'ELAVL4', 'NHLH1', 'KCNQ3'],
'Pan-neuronal': ['DCX', 'STMN2', 'SYT1', 'SYP', 'GAP43', 'MEF2C', 'FOXG1'],
'ExcitatoryNeurons': ['EMX1', 'NEUROD1', 'NEUROD2', 'NEUROD6', 'NEUROG2', 'SLC17A7', 'TBR1', 'BCL11A', 'CUX1', 'CUX2', 'SATB2'],
'Inhibitory Neurons': ['DLX5', 'DLX6', 'DLX6-AS1', 'GAD1', 'GAD2', 'NKX2.1', 'SST', 'NPY', 'CALB1', 'CALB2', 'VIP', 'PVALB'],
'oRG': ['FAM107A', 'HOPX', 'PTPRZ1', 'TNC'],
'Astrocytes': ['GFAP', 'SLC1A3', 'S100B', 'AQP4', 'FGFR3', 'ALDH1L1']
}
for Pop in marker_dictionary:
print("\n\n {}".format(Pop))
CustomUmap(adata, marker_dictionary[Pop])
Proliferation The following marker genes are missing: set()
Apical Progenitors The following marker genes are missing: set()
Intermediate Progenitors The following marker genes are missing: set()
Pan-neuronal The following marker genes are missing: set()
ExcitatoryNeurons The following marker genes are missing: set()
Inhibitory Neurons
The following marker genes are missing: {'CALB1', 'VIP', 'NKX2.1', 'PVALB'}
oRG The following marker genes are missing: set()
Astrocytes
The following marker genes are missing: {'AQP4', 'ALDH1L1'}
adata.write(results_file)
print(datetime.now())
2022-11-25 19:50:37.779670
nb_fname = '3_DimRed'
%%bash -s "$nb_fname"
jupyter nbconvert "$1".ipynb --to="python"
jupyter nbconvert "$1".ipynb --to="html"
[NbConvertApp] Converting notebook 3_DimRed.ipynb to python [NbConvertApp] Writing 8233 bytes to 3_DimRed.py [NbConvertApp] Converting notebook 3_DimRed.ipynb to html [NbConvertApp] Writing 2792611 bytes to 3_DimRed.html